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ABSTRACT 

We describe an exact, flexible, and computationally efficient algorithm for a joint estimation 
of the large-scale structure and its power- spectrum, building on a Gibbs sampling framework 
and present its implementation ARES (Algorithm for REconstruction and Sampling). ARES 
is designed to reconstruct the 3D power- spectrum together with the underlying dark matter 
density field in a Bayesian framework, under the reasonable assumption that the long wave- 
length Fourier components are Gaussian distributed. As a result ARES does not only provide 
a single estimate but samples from the joint posterior of the power- spectrum and density field 
conditional on a set of observations. This enables us to calculate any desired statistical sum- 
mary, in particular we are able to provide joint uncertainty estimates. We apply our method to 
mock catalogs, with highly structured observational masks and selection functions, in order 
to demonstrate its ability to reconstruct the power- spectrum from real data sets, while fully 
accounting for any mask induced mode coupling. 

Key words: large scale - reconstruction -Bayesian inference - cosmology - observations - 
methods - numerical 



1 INTRODUCTION 

Throughout cosmic history a wealth of information on the origin 
and evolution of our Universe has been imprinted to the large scale 
structure via the gravitational amplification of primordial density 
perturbations. Harvesting this information from probes of the large 
scale structure, such as large galaxy surveys, therefore is an im- 
portant scientific task to further our knowledge and to establish 
a conclusive cosmological picture. In recent years great advances 
have been made, both in retrieving huge amounts of data and in- 
creasing sensitivity in galaxy redshift surveys. Especially the re- 
cent galaxy surveys, the 2dF Galaxy Reds hift Survey (IColless et ah] 
|2001| ) and the Sloan Digital Sky Survey ( |Abazajian et al.|2008| ) pn> 
vide sufficient redshifts to probe the 3D galaxy distribution on large 
scales. In particular, the two point statistics of the matter distri- 
bution contains valuable information to test standard inflation and 
cosmological models, which describe the origin and evolution of all 
observed structure in the Universe. Measuring the power- spectrum 
from galaxy observations therefore has always attracted great in- 
terest. Precise determination of the overall shape of the power- 
spectrum can for instance place important constraints on neutrino 
masses, help to identify the primordial power- spectrum, and break 
degeneracies for cosmological parameter estimation from CMB 



data (e.g.|Hu et al.||1998, Spergel et al. 2003, Hannestad 


2003 


Efstathiou et al. 


2002, Percival et al. 2002, Spergel et al. 


2003 


Verde et al.|2003 


). In addition, several characteristic length scales 



have been imprinted to the matter distribution throughout cosmic 
history, which can serve as new standard rulers to measure the Uni- 



verse. A prominent example of these length scales is the sound hori- 
zon, which yields oscillatory features in the power- spectrum, the so 
called baryon acoustic oscillations (BAO) (e.g. Silk 1968, Peebles 
& Yu 1970, Sunyaev & Zeldovich 1970 ). Since the physics govern- 
ing these oscillatory features is well understood, precise measure- 
ments of the BAO will allow us to establish a new, precise standard 
ruler to measure the Universe through the distance redshift rela- 
tion ( |Blake & Glazebrook|2003l|Seo & Eisenstein|2003| >. Precision 
analysis of large scale structure data therefore is a crucial step in 
developing a conclusive cosmological theory. 

Unfortunately, contact between theory and observations can- 
not be made directly, since observational data is subject to a variety 
of systematic effects and statistical uncertainties. Such systemat- 
ics and uncertainties arise either from the observational strategy or 
are due to intrinsic clustering behavior of the galaxy sample itself 
( Sanc hez & Cole|2008| ). Some of the most prominent uncertainties 
and systematics are: 

• survey geometry and selection effects 

• close pair incompleteness due to fiber collisions 

• galaxy biases 

• redshift space distortions 

The details of galaxy clustering, and how galaxies trace the un- 
derlying density field are in general very complicated. The bias 
between galaxies and mass density is most likely non-linear and 
stochastical, so that the estimated galaxy spectrum is expected to 
differ from that of the mass (Dekel & Lahav 1999). Even in the 
limit where a linear bias could be employed, it still differs for dif- 
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ferent classes of galaxies (see e.g. Cole et al. 2005 ). In addition, 
the apparent density field, obtained from redshift surveys, will gen- 
erally be distorted along the line-of- sight due to the existence of 
peculiar velocities. 

However, the main cause for the systematic uncertainties in 
large scale power- spectrum estimations is the treatment of the sur- 
vey geometry (Tegmark 1995 ; Ballinger et al. 1995 ). Due to the sur- 
vey geometry the raw power- spectrum yields an expectation value 
for the power- spectrum, which is the true cosmic power- spectrum 



convolved with the survey mask ( [Cole et aL[2 005 ). This convolu- 
tion causes an overall distortion of the power- spectrum shape, and 
drastically decreases the visibility of the baryonic features. 

The problems, mentioned above, have been discussed ex- 
tensively in literature, and many different approaches to power- 
spectrum analysis have been proposed. Some of the main tech- 
niques to recover the power- spectrum from galaxy surveys are 
Fourier transform based, such as the optimal weighting scheme, 
which assigns a weight to the galaxy fluctuation field, in order to 
reduce the error in the estimated power (see e.g. Feldman et al. 
|Tegmark|1995| |Hamilton| 1997a} |Yamamoto|2003| |PercivaT| 



1994 



et al. 



2004 ). Alternative methods rely on Karhunen-Loeve decom- 



positions ( [Tegmark et aT| l997 ; Tegmark & et al.|20 04, Pop e et al.| 
2004 ) or decompositions into spherical harmonics, which is espe- 
cially suited to address the redshift space distortions problematic 



(|Fisher et al.|1994||Heavens & Taylor| 19951 |Tadros e t al. 1999||Per] 
|cival et al" 2004 ; Percival 2005 ). In addition, to these deconvolution 
methods there exists a variety of likelihood methods to estimate 
the real space power- spec trum (jBallinger et al.||1995] [Hamilton 
1997a b ; Tadros et al. 1999 ; Percival 2005}. In order to not just pro- 
vide the maximum likelihood estimate but also conditional errors, 
Percival (2005 ) proposed a Markov Chain Monte Carlo method to 
map out the likelihood surface. 

Nevertheless, as the precision of large scale structure exper- 
iments has improved, the requirement on the control and charac- 
terization of systematic effects, as discussed above, also steadily 
increases. It is of critical importance to propagate properly the un- 
certainties caused by these effects through to the matter power- 
spectrum and cosmological parameters estimates, in order to not 
underestimate the final uncertainties and thereby draw incorrect 
conclusions on the cosmological model. 

We therefore felt inspired to develope a new Bayesian ap- 
proach to extract information on the two point statistics from a 
given large scale structure dataset. We prefer Bayesian methods 
to conventional likelihood methods, as the yield more general and 
profound statements about measurements. In example, conven- 
tional likelihood methods can only answer questions of the inner 
form like :" Given the true value s of a signal, what is the proba- 
bility distribution of the measured values dV A Bayesian method, 
on the other hand, answers questions of the type :"Given the ob- 
servations d, what is the probability distribution of the true un- 
derlying signal sT For this reason, Bayesian statistics answers the 
underlying question to every measurement problem, of how to es- 
timate the true value of the signal from observations, while con- 
ventional likelihood methods do not (Michel & Kirchhoff 1999 ). 
Since the result of any Bayesian method is a complete probabil- 
ity distribution they permit fully global analyses, taking into ac- 
count all systematic effects and statistical uncertainties. In particu- 
lar, here, we aim at evaluating the power- spectrum posterior distri- 
bution P ({P(ki)}\{di}), with P(ki) being the power- spectrum coeffi- 
cients of the kfth mode and d { - d(x t ) is an observation at position 
Xi in three dimensional configuration space. This probability distri- 
bution would then contain all information on the two point statistics 



supported by the data. In order to explore this posterior distribution 
we employ a Gibbs sampling method, previously applied to CMB 
data analysis (see e.g. |Wandelt||2004[ |Eriksen et aL]|2004[ |Jewell| 
|et al.|2004) . 

Since direct sampling from P ({P(ki)}\{di}) is impossible or 
at least difficult, they propose instead to draw samples from the 
full joint posterior distribution P({P(ki)},{Si}\{di}) of the power- 
spectrum coefficients P(ki) and the 3D matter density contrast am- 
plitudes Si conditional on a given set of data points {dj. This is 
achieved by iteratively drawing density samples from a Wiener- 
posterior distribution and power- spectrum samples via an efficient 
Gibbs sampling scheme (see figure [TJfor an illustration). Here, ar- 
tificial mode coupling, as introduced by survey geometry and se- 
lection function, is resolved by solving the Wiener-filtering equa- 
tion, which naturally regularizes inversions of the observational re- 
sponse operator in unobserved regions. In this fashion, we obtain a 
set of Monte Carlo samples from the joint posterior, which allows 
us to compute any desired property of the joint posterior density, 
with the accuracy only limited by the sample size. In particular, 
we obtain the power spectrum posterior P ({Pft)}|{^}) by simply 
marginalizing the joint posterior P ({P(^)}, over the auxil- 

iary density amplitudes su which is trivially achieved by ignoring 
the Si samples. 

The Gibbs sampler also offers unique capabilities for propa- 
gating systematic uncertainties end-to-end. Any effect, for which 
we can define a sampling algorithm, either jointly with or condi- 
tionally on other quantities, can be propagated seamlessly through 
to the final posterior. 

It is worth noting, that our method differs from traditional 
methods of analyzing galaxy surveys in a fundamental aspect. Tra- 
ditional methods consider the analysis task as a set of steps, each 
of which arrives at intermediate outputs which are then fed as in- 
puts to the next step in the pipeline. Our approach is a truly global 
analysis, in the sense that the statistics of all science products are 
computed jointly, respecting and exploiting the full statistical de- 
pendence structure between various components. 

In this paper we present ARES (Algorithm for REconstruction 
and Sampling), a computer algorithm to perform a full Bayesian 
data analysis of 3D redshift surveys. In section[3]we give an intro- 
duction to the general idea of the large scale structure Gibbs sam- 
pling approach, followed by section[4]and[5] where we describe and 
derive in detail the necessary ingredients to sample the 3D den- 
sity distribution and the power- spectrum respectively. The choice 
of the prior and the relevance for the cosmic variance are discussed 
in section [6] Details concerning the numerical implementation are 
discussed in section [7] We then test ARES thouroughly in section 
[8] particularly focussing on the treatment of survey masks and se- 
lection functions. In section [9] we demonstrate the running median 
filter, and use it as an example to demonstrate how uncertainties can 
be propagated to all inferences based on the set of Gibbs samples. 
Finally we conclude in section [10] by discussing the results of the 
method and giving an outlook for future extensions and application 
of our method. 



2 NOTATION 

In this section, we describe the basic notation used throughout this 
work. Let the quantity p t = p{Xi) be the field amplitude of the three 
dimensional field p(x) at position x t . Then the index i has to be 
understood as a multi index, which labels the three components of 
the position vector: 
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j5 = [xj,j£j£], (1) 

where x\ is the 7th component of the z'th position vector. Alterna- 
tively one can understand the index i as a set of three indices {r, s, 
so that for an equidistant grid along the three axes the position vec- 
tor can be expressed as: 

*i = x r , s ,t = [Ax r, Ay s, Az t] , (2) 

with Ax, Ay and Az being the grid spacing along the three axes. 
With this definition we yield: 

Pi = Pr,s,t ■ (3) 

Also note that any summation running over the multi index i is 
defined as the three sums over the three indices r, s and t: 

i r s t 

Further, we will frequently use the notation {p t }, which denotes the 
set of field amplitudes at different positions x t . In particular: 

{pi\ = {po,Pl,P2,-,PN-l}> (5) 

where N is the total number of position vectors. 



4 SAMPLING THE SIGNAL MAPS 

Assuming a Gaussian signal posterior distribution 
P({si}\{P(ki)},{di}), the task of drawing a random signal sam- 
ple can be split into two steps. 

First, we estimate the maximum a postiori values for the signal 
amplitudes s t , which in the Gaussian case coincide with the mean 
values. Then a fluctuation term, being consistent with the correct 
covariance, is added to the mean signal. The sum of the mean and 
the fluctuation term will then yield a sample from the conditional 
posterior. 

The most challenging procedure in this signal sampling step 
is to calculate the a postiori values for the signal amplitudes s t . As- 
suming a Gaussian posterior will directly lead to a Wiener filtering 
procedure, described below. However, this method requires to in- 
vert huge matrices which consists of the sum of the inverse signal 

5 and inverse noise N covariance matrices. The matrix inversion is 
a numerically very demanding step, and at the same time presents 
the bottleneck for our method, as it defines the computational speed 
with which a signal sample can be produced. The efficient imple- 
mentation of these matrix inversion step, as described by Kitaura 

6 EnBlin (2008 ), allows for the production of many thousands of 
samples in computational feasible times. In the following sections 
we will describe the details of the signal sampling procedure. 



3 THE LARGE SCALE STRUCTURE GIBBS SAMPLER 

As already described in the introduction, we seek to sample from 
the joint posterior distribution P({P(ki)},{Si}\{di}) of the power- 
spectrum coefficients P(k t ) and the 3D matter density contrast am- 
plitudes Si given a set of observations {^ }. 

In principle, this joint posterior distribution could be mapped 
out over a grid in the multi-dimensional space of the signal ampli- 
tudes Si and power- spectrum coefficients P(ki). But since the num- 
ber of grid points required for such an analysis scales exponentially 
with the number of free parameters, this approach cannot be real- 
ized efficiently. For this reason, we propose a Gibbs sampling ap- 
proach to this problem. 

The theory of Gibb s sampling (Gelfand & Smith|1990[[TSri 
ner 1996 ; O'Hagan 2000 ) states, that if it is possible to sample from 
the conditional densities P({si)\{P(ki)}, {d t }) and P({P(ki)}\{si], K}), 
then iterating the following two sampling equations will, after 
an initial burn-in period, lead to samples from the joint posterior 
P({Si}AP(ki)}\{di}y. 



{Si} {j+l) ^P«Si}\{P(ki)^\{di}), 

{P(kdf +1) ^ P«P(ki)}\{Si} (j+1 \ {di}) , 



(6) 
(7) 



where the symbol denotes a random draw from the probability 
density on its right. 

Once a set of samples from P({si], {P(ki)}\{di}) has been ob- 
tained, the properties of this probability density can be summarized 
in terms of any preferred statistic, such as its multivariate mean, 
mode or variance. 

As our approach probes the joint distribution, we are able to 
quantify joint uncertainties of the signal amplitudes and the power- 
spectrum conditional just on the data. For this reason, the Gibbs 
sampling approach should not be considered as yet another maxi- 
mum likelihood technique, although it certainly is able to produce 
such an estimate. 

In the following we are going to describe the necessary meth- 
ods and procedures required for iterating the processes [6] and [7] of 
signal and power- spectrum sampling. 



4.1 The Wiener filter 

As already described above, the main task for the signal sampling 
step is to derive the maximum a postiori values for the signal ampli- 
tudes According to B ayes' theorem the conditional signal pos- 
terior can be written as the product of a signal prior and a likeli- 
hood normalized by the so called evidence. Further, here we will 
use the signal covariance matrix S rather than the power- spectrum 
{P(ki)}. It is well known, that the power- spectrum is just the Fourier 
transform of the signal covariance in configuration space. Since the 
Fourier transform is a basis transformation with a unitary trans- 
formation matrix, the signal covariance matrix S and the power- 
spectrum {P(ki)} can be used interchangeably for a normalized 
Fourier transform (see section |5.1| and Appendix [B] for more de- 
tails). 

We can therefore write the signal posterior as: 



P({ Si }\SAdi}) = 



P(S) 
P({di},S) 
1 



P({si}\S)P«di}\{silS) 



■P({si}\S)P«di}\{si}), 



P({di)\S)'—' (8) 

where we assume that the data amplitudes d t are conditionally in- 
dependent of the signal covariance matrix S , once the signal ampli- 
tudes Si are given. Following Bardeen et al. (1986 ), we describe the 
signal prior for the large scale matter distribution as a multivariate 
Gaussian, with zero mean and the signal covariance S . We can then 
write: 



P({Si)\S) 



1 



Vdet (2ttS) 



- \ Z/Z./ siSij 



(9) 



The Fourier transform of the signal covariance matrix S has an es- 
pecially appealing form in Fourier space. It is well known, that in 
an homogeneous and isotropic universe the Fourier transform of the 
signal covariance is a diagonal matrix, with the diagonal elements 
being the power- spectrum. Hence, we can express the Fourier rep- 
resentation of the signal covariance as: 



(10) 
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Wiener-mean 
+ 

Wiener-variance 



3D density sample 



power- spectrum sample 







Inverse Gamma 
sampling 

^ J 







Figure 1. Flow-chart depicting the two step iterative Gibbs sampling procedure. 



where the "-symbol denotes a Fourier transform, 8f. is the Kro- 
necker delta and P k = P(k k ) is the power- spectrum coeffici ent at 



Pad- 



the Fourier mode k k in three dimensional pixel space (see e.g 
|manabhan|1993l|Lahav & Suto|2004| ). 

The choice of the Gaussian prior can be justified by inflation- 
ary theories, which predict the matter field amplitudes to be Gaus- 
sian distributed in the linear regimes at scales k < 0. 15 h/Mpc ( |Pea-| 
|cock & Dodds|1994||Percival et al.|200l) . 

At nonlinear scales the Gaussian prior does not represent the 
full statistical behavior of the matter field anymore. During non- 
linear gravitational structure formation the statistics of the initial 
density field has evolved from a Gaussian distribution towards a 
log normal distribution as commonly assumed in literature ( Coles 
|& Jones|199T]|Colombi|1994l|Kayo et al.|2001) . 

However, note that in this case the Gaussian prior still de- 
scribes the two point statistics of the underlying density field even 
in the nonlinear regime. The Gaussian prior should therefore be re- 
garded as our a priori knowledge of the matter distribution, which 
is only formulated up to two point statistics. Next, we discuss the 
likelihood P({di}\{Si}) given in equation (8]). 

As we seek to recover the maximum a postiori signal s t from 
the set of observations d t we must assume a model which relates 
these both quantities. The most straight forward data model is lin- 
ear, and can be written as: 



di = ^ K ik s k + €i , 



(11) 



where is an observation response operator and e t is an additive 
noise contribution, which will be defined in more detail in the next 
section. If we assume the noise e f to be Gaussian distributed, with 
zero mean and covariance N, we can express the likelihood as: 

P({di}\{Si}) = 1 _ e -\(£iZj [di-Zk KiksfiNif 1 [d r 2i KflSi]) q 2 ) 

Vdet (2ttN) 

where we simply inserted the data model given in equation 
into the Gaussian noise distribution. 

With these definitions the signal posterior is a multivariate 
Gaussian distribution and can be written as: 

e - 2 (Li Lj SiSif 1 Sj+ [di-^k K ik s k ] Nif 1 [dj-Zt K fl Sl \) 



P({Si)\SAdi}) 



(13) 



where we omitted the normalization factor, which is of no interest 
in the following. 

Note, that omitting the normalization of the likelihood, re- 
quires that the additive noise term is independent of any signal con- 



tribution, as otherwise the noise covariance matrix would carry sig- 
nal information and could not be neglected in the following. This 
assumption, however, is in general not true for the Poissonian noise, 
as described, in the next section. 

The maximum of this signal posterior can then easily be found 
by either completing the square in the exponent of equation (\3\ , or 
by extremizing P({Si)\S, {di}) with respect to the signal amplitudes 
The latter approach allows us to directly read of the Wiener filter 
equation from equation |T3] >, by simply differentiating the exponent 
with respect to s t and setting the result to zero. The result is the 
famous Wiener filter equation which is given as: 

Z s u + Z Z k «" n ^ k 'j m J = Z Z K ™ N ^ d > ' (14) 

j l ml J ml 

where we denoted the variable m 7 as a Wiener mean signal ampli- 
tude, to clarify that this reconstruction is the mean and not a typical 
sample of the distribution. The solution of this equation requires to 
invert the matrix: 

D « = Sy 1 + 2Z*-^« I tl5) 

m I 

which leads to the solution for the signal amplitudes 

^Z^/ZZ^^'- (16) 

j m I 

This result demonstrates that estimating the maximum a postiori 
values mi for the signal amplitudes s t involves inversions of the 
Wiener filter operator D. Therefore, the signal- sampling operation 
is by far the most demanding step of our Gibbs sampling scheme, 
as it requires the solution of a very large linear system. 

Formally speaking, in practice, this corresponds to inverting 
matrices of order ~ 10 6 X 10 6 or larger, which clearly is not compu- 
tationally feasible through brute-force methods. For example, ma- 
trix inversion algorithms, based on usual linear algebra methods, 
have a numerically prohibitive 0(N 3 pix ) scaling, in order to trans- 
form to the eigenspace of the system, which bars sampling from 
the signal posterior. 

This is the situation in which Kitaura & EnBlin (2008 ) pro- 
posed a particular operator based inversion technique to allow for 
computationally efficient calculation of the Wiener filter equation 
in three dimensional space. 

In this implementation, the system of equations (T6] > can be 
solved by means of conjugate gradients (CGs). The computational 
scaling of this method is thus reduced to the most expensive step 
for applying the operator on the right-hand side of the equations, 
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which in our case is the Fast Fourier transform, which scales as 
0(N pix \og(N pix )) . 

4.2 The galaxy data model 

In order to adapt the Wiener filter procedure for the specific ap- 
plication to galaxy observations, we are going to present the galaxy 
data model together with the according Poissonian noise covariance 
matrix. 

It is possible to describe the observed galaxy distribution as a 
realization of an inhomogeneous Poissonian process (Martinez & 
|Saar|2 002). We can therefore assume the observed galaxy numbers 
N9 = N°(Xi) at position x t in three dimensional configuration space 
to be drawn from a Poissonian distribution (Martinez & Saar 2002 ; 
|Kitaura et al."l2009) . 



A?" 



-A 



N°l 



(17) 



where the arrow denotes a random draw from the probability dis- 
tribution and A? is the mean observable galaxy number at position 
x t . We can then write the observed galaxy numbers at discrete po- 
sitions as: 



Af = (N°) + e°=A° + e° 



(18) 



where the noise term e° denotes the difference between the ob- 
served galaxy number and the mean observable galaxy number. The 
Poissonian noise covariance matrix can then easily be obtained by: 

Nfj = (e? e°) = {[N°-(N°)W°-(N°)]) = 5fj(N°) = 6fjA° ,(19) 

where we simply calculated the Poissionian variance for the ob- 
served galaxy number assuming the galaxies to be independent and 
identically distributed (i.i.d). The mean observable galaxy number 
can be related to the true mean galaxy number A t by applying the 
observation response operator R^ as: 



(20) 



The true mean galaxy number, on the other hand, can be related 
to the dark matter over density field, the signal su by introducing 
a physical model in the form of a bias operator By, e.g. a scale 
dependent bias: 



(21) 



By inserting equations ( |20| and pT) into equation |T8] > and apply- 
ing trivial algebraic conversions, we yield the data model: 



(22) 



For the case of galaxy redshift surveys the response operator R t j is 
the product of the sky mask and the selection function, which are 
both local in configuration space, and hence the response operator 
turns to: 

R ij = 6f j M i F i , (23) 

where M t is the value of the sky mask and F t is the value of the se- 
lection function at position i. We therefore arrive at the data model 
already described in equation (TT| , which reads: 



Mi 



Sk + 



(24) 



where we introduced the effective observation response operator 
K t j = MiFiBij and the noise contribution e t = ef/A. This is the 
galaxy data model which we derived from the assumption of the 
Poissonian distribution of galaxies. 

The Wiener filter operator requires the definition of the noise 
covariance matrix N, which for the Poissonian noise can be ex- 
pressed as: 



<^> : 



<gff) 
A 2 



■ 6 K — , 

l] A 2 ' 



(25) 



where we used the Poissonian noise covariance matrix given in 
equation |T9] >. 

However, introducing equation ( |20| and $2\\ yields the noise 
covariance matrix: 



1 



2> l+ I> 



kl si 



(26) 



k v l 

which immediately reveals, that there is a correlation between the 
underlying signal amplitudes s t and the level of shot noise produced 
by the discrete distribution of galaxies (see e.g. Seljak 1998). 

Nevertheless, as pointed out in the previous section, the 
Wiener filter relies on the fact, that the additive noise contribution 
is uncorrelated with the signal. Hence, we have to assume the noise 
covariance as uncorrelated with the signal, but it may have some 
structure. 

Therefore, we provide two approaches to effectively approxi- 
mate the noise covariance matrix given in equation ( [26] ). 

In the first approach we calculate an effective noise covariance 
matrix by averaging the noise covariance matrix given in equation 
(26l over the signal. We then obtain: 



(Nij) s 
1 



k 



(Sl)s 



k 



(27) 



where we used the fact, that the ensemble mean of the signal am- 
plitudes for the density contrast vanishes. Note, that this model also 
arises when persuing a least squares approach to matter field recon- 
structions rather than the Bayesian approach as described in this 
work (for details see |Kitaura & EnBlin|2008] ). 

In the other approach we introduce a noise structure function 



F 

nr. given as: 



A 2 



(28) 



With this definition the noise is approximated as being uncorrelated 
to the signal, but nonuniform. The noise covariance matrix then 
reads: 



: Sfj n, 



(29) 

In order to use this noise structure function we have to estimate A? 
from the observed galaxy numbers N?. By applying Bayes' theo- 
rem to the Poissonian distribution given in relation ([39} we yield: 



P(N°\A°) 



P(N°) 



(30) 



In the absence of any further a priori knowledge on A? we assume 
a flat prior and search for the maximum of: 
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Figure 2. Slices through signal sample produced in one Gibbs sampling step. The left panels (a,d) show the Wiener filtered mean signal, panels (b,e) present 
the fluctuation term, and the right panels show the full, noiseless Gibbs sample. The color map encodes the amplitude of the density contrast. 



P(A?\N?) : 



T(N9 + 1) ' 



(31) 



which is normalized to yield unity when integrated over all A?. 

The noise structure function n? F can then be estimated by 
searching the most likely value for A? from equation (31 1. This 
yields: 



N° 



(32) 



Another estimator for rf F is based on evaluating the mean of the 
probability distribution given in equation ( [37] ). The ensemble mean 
is calculated as: 



Jo 



dA»Ar 



,A° N ?e-*? 



Y(N° + 1) 

in this case the noise structure function n^ F can be written as: 
N° + l 



A 2 



(33) 



(34) 



A more thorough discussion on Poissonian noise models and their 
numerical implications for matter field reconstructions can be 
found in |Kitaura et 31(2009) . 



4.3 Drawing signal samples 

In the previous sections, we described the Wiener filter and the 
galaxy data model, which are required to estimate the mean of 



the signal posterior. However, this mean signal is no sample from 
the signal posterior yet, neither does it represent a physical density 
field, as it lacks power in the low signal to noise regions. To create 
a true sample from the signal posterior, one must therefore add a 
fluctuation term y u which compensates the power lost due to noise 
filtering. The signal sample can then be written as the sum of the 
signal mean and a fluctuation term: 



Si = m l + y t . 



(35) 



In our approach we realize the fluctuation term by generating a 
mock signal s* and a mock observation d* consistent with the 
data model given in equation ( |24| ). This kind of mock observa- 
tion generation is well known in literature and has been applied 
to various scientific applications, as for instance the generation 
of constrained initial conditions for Nbody simulations (see e.g. 
|Bertschinger[[T987l |Hoffman & Ribak|[T9^T| [Ganon & Hoffman] 
1993 ; Kitaura & EnBlin 2008). The fluctuation term can then sim- 
ply be calculated as: 



(36) 



The interpretation of this equation is simple. In the high signal to 
noise regime, the Wiener filter is nearly a pass-through operator, 
meaning the reconstructed signal is nearly identical to the true un- 
derlying signal. Therefore, as the variance is low, the fluctuation 
term tends towards zero. In the low signal to noise regime, on the 
other hand, the Wiener filter will block, and no signal can be recon- 
structed. The fluctuation term will therefore be nearly identical to 
the underlying mock signal s* . 
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In this fashion we add the correct power to the Wiener mean 
reconstruction. The effect of adding the fluctuation term to the 
Wiener mean is presented in figure [2] where we see the Wiener 
mean reconstruction, the fluctuation term and the sum of both. 

The mock data d* is generated to obey the data model de- 
scribed in equation ( [24} and the Wiener variance. 

We therefore first draw a mock signal s* with correct statistics 
from the multivariate Gaussian signal prior given in equation j5|. 
Such a mock signal is best generated in Fourier space following the 
description of Martel (2005). One first draws two Gaussian random 
numbers, Xa and Xb* with zero mean and unit variance and then 
calculates the real and imaginary part of the signal in Fourier space 
as: 



(37) 



where P k is the power- spectrum coefficient at the Mi position in 
Fourier space. Note, that the mock signal s* is supposed to be a 
real quantity, and therefore hermiticity has to be imposed in Fourier 
space before performing the inverse Fourier transform (for details 
see |Martel|2005| >. 

Next we have to generate the additive noise contribution. In 
order to draw a noise term with the correct Poissonian statistics, we 
first draw a random number N* from the Poissonian distribution: 




N* ^P(N*\A*), 



(38) 



where we choose the mean observed galaxy number to be A* = 
n^l 2 . According to equations {l8| and p4| the mock noise term 
e* can be calculated as: 



N* - n s /l 2 



(39) 



It is clear by construction that this mock noise term has vanishing 
mean and the correct noise covariance matrix. Then, according to 
equation ([24| the mock observation is given as: 



(40) 



The proof, that the fluctuation term y t as generated by equation ( [36| 
truly generates the correct variance is given in Appendix [C] 

Note, that the application of the Wiener operator is a linear 
operation, and we can therefore rewrite equation (35) as: 



(41) 



where the Wiener operator is applied to the true data d t and the 
mock observation d* simultaneously. This greately reduces the 
CPU time required for the generation of one signal sample. 



5 SAMPLING THE POWER SPECTRUM 

As described above, the signal sampling step provides a noise-less 
full sky signal sample s t consistent with the data. The next step 
in the Gibbs sampling iteration scheme requires to draw power- 
spectrum samples from the conditional probability distribution 
P(S\{si),{di}). Since in this Gibbs sampling step the perfect sky 
signal amplitudes s t are known, the power- spectrum is condition- 
ally independent of the data amplitudes d t . Hence, in this Gibbs 



sampling step, we can sample the power- spectrum from the proba- 
bility distribution P(S\{Si}). In the following we will show that the 
power- spectrum can easily be drawn from an inverse gamma distri- 
bution. 



5.1 Drawing power-spectrum samples 

According to B ayes' theorem, we can rewrite the conditional prob- 
ability P(S\{si}) as: 

P(S) 



P(S\{ Si }): 



zn{si)\s), 



(42) 



nisi)) 

where P(S) is the prior for the signal covariance, P({Si}\S) is given 
by equation ([9} and P({si}) is a normalization constant in this Gibbs 
sampling step. 

More specifically, we are interested in the set of matrix co- 
efficients {Sij} of the covariance matrix S. As already pointed 
out in section |4.1| the signal covariance matrix of an homoge- 
neous and isotropic universe, has an especially appealing form in 
Fourier space, where it takes a diagonal form. In our application the 
real space covariance matrix coefficients {Sij} are related to their 
Fourier representation via the fast Fourier transform, as defined in 
Appendix [A] We can therefore write: 



= C 



N-l N-l 

k=0 1=0 
N-l N-l 

2 EE* 

k=0 1=0 



,2n^(ik-jl)§ k ^ 



(43) 



Then we can express the conditional distribution for the Fourier 
signal covariance coefficients S ki as: 



^({^«}|{Ji}) = ^({5y}|{Ji}) 



d{S t 



d{S k 



where 



= |detC7 ( 



(ij)(kl)J 



(44) 



(45) 



d{S k 

is the Jacobian determinant for this coordinate transformation. As 
the discrete Fourier transform is proportional to a unitary matrix, 
this Jacobian determinant only amounts to a normalization con- 
stant, as has been demonstrated in Appendix [B] 

With this definition we can rewrite the conditional probability 
in equation ( [42] ), by replacing all the real space covariance matrix 
coefficients by their Fourier representation S kh and normalizing 
it with the constant obtained from the coordinate transformation. 
We can therefore write: 

n{hi)\{si}) = ^i^\ msi}\{hi}) 



C N2 P« Si }) 



C Nl Jdet 



_C±_ yN-1 yN-l ? * f 1 ? 
-e 2C 2 L k=0 ^1=0 S k^kl s l 



(46) 



where we used the discrete Fourier transform definition, given in 
Appendix [A] to replace the real space signal amplitudes s t by their 
Fourier counterparts s k . Introducing equation (T0| then allows us to 
rewrite equation ( [46} in terms of the power- spectrum coefficients 
Pk as: 
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P({Pk)\{Si}) 



c N2 n{Si})\ f } Q 



(47) 



where the determinant factorizes due to the diagonal form of the 
signal covariance matrix in Fourier space. 

Note, that due to isotropy the power- spectrum is independent 
of direction in Fourier space, meaning the power- spectrum coeffi- 
cients only depend on the modulus of the mode vector k k : 



P k = P(k k ) = P(\k k \). 



(48) 



For this reason, the angular dependence in Fourier space can be 
summed over. 

To do so we remark that the mode vector Jc k , as a geometrical 
object, will not change if we express it in the basis of cartesian 
coordinates k k = k k {k\,k 2 k ,k k ), or if we describe it in the basis of 
spherical coordinates k k = k k (\k k \, (p k , $ k ). We can therefore split the 
multi index summation into the summation over the three spherical 
coordinates as: 



r^2 N ~ l I- i2 

<£ y \skT_ 

Q2 Zj p k 

^ k=0 k 



- Z 



z 

i4i 

M-l 

z 



o-(\h\) 
P(\h\) 



(49) 



where we introduced cr(|4l) = C 2 /C 2 \s (fo|, (p k , ^)| , 

which is the summed signal power on spherical shells around the 
origin in Fourier space, and the index m labels each of the M shells 
belonging to the different mode vector modulus \k k \ in the Fourier 
box. 

Several different mode vectors k k may have the same vector 
modulus \k k \, and therefore belong to the same shell. To account for 
this we introduce the number n m , which counts the number of dif- 
ferent mode vectors k k , belonging to the mth shell in Fourier space. 
This number n m , therefore counts the degrees of freedom for each 
of the M modes. We can then express the product in equation ( |47| ) 
in terms of m as: 



f[(2n P k y 112 = Y](2nP m y nm/2 . 

k=0 m=0 

With these definitions equation ( |47] > turns to: 



P({Pk)\{Si}) 



C N2 P({ Si }) 



Y\{2nP m y 



>hn/2 



e 2 P m 



(50) 



(51) 



When ignoring the power- spectrum prior P({P k }) in the above 
equation (5T), we see that the probability distribution factorizes in 
the different P m , meaning they could be sampled independently. 
If also the prior for the different P m would factorize as: 



n{p k }) = Y\np m ), 



(52) 



then it is possible to sample each mode of the power- spectrum in- 
dependently. 

On large scales, or in the linear regime, the theory of gravita- 
tional structure formation tells us that the different Fourier modes 
evolve independent of each other. In these regimes the proposed 
power- spectrum prior would be the adequate choice. However, we 
also know that nonlinear structure formation couples the different 



Fourier modes, the stronger the deeper we reach into the nonlinear 
regime. In these regimes another prior would be more adequate, but 
also harder to sample. 

Anyhow, as already described in section [3] the entire power- 
spectrum sampling method requires two steps. While the differ- 
ent power- spectrum modes are assumed to be independent in the 
power- spectrum sampling step, they are not in the signal sampling 
step. There the different modes are coupled via the observation 
mask and selection function, and furthermore, the physical cou- 
pling of the different modes is represented in the data. 

Therefore, in the following we assume a power- spectrum 
prior, as proposed in equation (52), and defer a more thorough in- 
vestigation of adequate prior choices in the nonlinear regime to fu- 
ture work. 

With this prior choice each mode can be sampled indepen- 
dently from the following probability density distribution: 



P(P m \{Si}) = 



P(Pm) 



(c N2 P({si})) 



l/M 



(2xP m ) 



-n m /2 , 



(53) 



Further, we will assume a power-law behavior for the individual 
mode prior P(P m ) oc P~ a where or is a power law index. Note, that 
a power-law index a = describes the flat prior, while a - I 
amounts to the Jeffrey's prior. The Jeffrey's prior is a solution to 
a measure invariant scale transformation of the form P(P m )dP m = 
P(y Pm) ydP m ( Wandelt 2004), and therefore is a scale independent 
prior, as different scales have the same probability. 

Inserting this power law prior in equation (53) and imposing 
the correct normalization, reveals that the power- spectrum coeffi- 
cients have to be sampled from an inverse gamma distribution given 
as: 



P(Pm\{Si}) = 



(a-l)+n m /2 



1 



r((a-l)+ n f)P\ 



,(a+n m /2) 



1 > r m 
e 2Pm 



(54) 



By introducing the new variable x m = <r m /P m and performing 
the change of variables we yield the ^-distribution as: 



n/2-l 



P(x m \{si}) ■ 



r(p m /2) (2f< 



,12 



e 2 , 



(55) 



where J3 m = 2(a + n m /2 - 1). Sampling the power- spectrum coeffi- 
cients is now an easy task, as it reduces to drawing random samples 
from the x 1 -distribution. A random sample from the x 2 -distribution 
for an integer /3 m can be drawn as follows. 

Let Zj be j3 m independent, normally distributed random vari- 
ates with zero mean and unit variance then: 



(56) 



7=1 



is x 2 -distributed, and z m is a J3 m element vector, with each element 
being normally distributed. The power- spectrum coefficient sample 
is then obtained by: 



\U 2 



(57) 



It is easy to see that each spectrum coefficient sample is a positive 
quantity, this ensures that the signal covariance matrix is positive 
definite as it has to be by definition. 

To summarize, we provide an optimal estimator for the power- 
spectrum coefficients, and their uncertainties. 

It is also worth mentioning, that the inverse gamma distribu- 
tion is a highly non-Gaussian distribution, and that for this reason, 
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the joint estimates of signal amplitudes s t and power- spectrum co- 
efficients P m are drawn from a non-Gaussian distribution. 



5.2 Blackwell-Rao estimator 

As described in the introduction, we seek to estimate the probabil- 
ity distribution P({P m }\{di}), which we can now simply obtain by 
marginalizing over the signal samples: 



/d {sij 



^({P m )|{s i },{d i })n{s i }|{d i }) 



■ ; 



dfSijnfPmMSiKWfSiJKdi}) 

i N Gibbs 



NGibbs 



(58) 



7=1 



where {SiY are the signal Gibbs samples, and N Gibbs is the total 
number of Gibbs samples. 

This result is known as the Blackwell-Rao estimator of 
P({Pm}\{di}) which is guaranteed to have a lower variance than a 
binned estimator jWandelt|2004] ). 

It is worth noting, that P{[P m }\{Si\) has a very simple analytic 
form, and therefore equation ( [62] ) provides an analytic approxima- 
tion to P({P m }\{di}) based on the Gibbs samples. All the information 
on P([P m }\{di}) is therefore contained in the <r m of the individual 
Gibbs steps, which generate a data set of size 0(m max N Gibbs ), where 
tn max is the maximal number of independent modes. In addition, to 
being a faithful representation of P({P m }\{di}) the Blackwell-Rao 
estimator is also a computationally efficient representation, which 
allows to calculate any moment of P({P m }\{di}) as: 



(Pm Pm' •••Pm")\ e P(P m \{d i }) 



1 



Nabbs 



N G 



(PmPm' •••Pm")\ e P(P m \{s i }j) » (59) 



7=1 



where each of the terms on the right handside can be calculated 
analytically. 

For the inverse gamma distribution given in equation ( [54} we 
can then simply calculate the mean of the probability distribution 
P(P m \{di}) as: 



(P m )\p(p n 



\{di)) ~ 



1 Y NGibbs _j 
NGibbs ^7=1 °™ 



2(or -2) + n m 
and in analogy the variance as: 



(60) 



{[Pm-(Pm)f)\nP n 



\{di)) ' 



Ndbbs 



4 ((a - 2) + nJ2Y {{a - 3) + nJ2) 



•(61) 



The Blackwell-Rao estimator also allows us to demonstrate an- 
other remarkable property of the Gibbs sampling approach. Al- 
though a specific power-spectrum prior has to be employed dur- 
ing the Gibbs analysis of the data, a post processing analysis of 
the power- spectrum can be performed with any desired power- 
spectrum prior. Lets assume one prefers to perform a post process- 
ing analysis with a power- spectrum prior f"({P m }), rather than with 
the prior P({P m }), which was employed during Gibbs sampling. We 
therefore want to estimate the power-spectrum from the following 
posterior: 



P'dPJ) 
P({P m }) 



nidi}) 
n{p m )\{di)) 



p'({p m )) 
n{pj) 



^({PmillSiimiSillldi)) 



d{s i }p > ({p m }) n ^ } , l{P r } V ({s i }|{d i }) 



nisi)) 



N Gibbs 



N G 



7=1 



P{{Sj} j \{P m }) 
P({SiV) ' 



(62) 



where we simply made use of the Bayes theorem. Since 
P({si)\{P m }) is a simple Gaussian distribution, and therefore given 
analytically, the posterior P'{{P m }\{di}) can be calculated with any 
desired power- spectrum prior in a post-processing step. 



6 THE PRIOR AND THE COSMIC VARIANCE 

The Gibbs sampling procedure consists of the two basic steps, of 
first sampling perfect noise-less full sky signal samples s t and then 
sampling the power- spectrum coefficients P m given s t . 

Therefore, the probability distribution P(P m \{si}), given in 
equation ( [54| , encodes our knowledge on the power- spectrum co- 
efficients P m , if we had perfect knowledge of the true signal ampli- 
tudes Si. 

It is clear, that in the case of perfect observations the full pos- 
terior distribution for the power- spectrum coefficients P(P m \{di}) 
would reduce to that one given in equation ( |54| . 

This is, because in the case of perfect full-sky and noise-less 
observations, the signal posterior would collapse to a Dirac delta 
distribution, due to the vanishing noise covariance matrix. This 
means, that in this case the signal amplitudes s t can be estimated 
with zero variance. 

However, measuring the P m to arbitrary precision will never 
be possible. The power- spectrum coefficients depend on the data 
through the <x m , which measure the actual fluctuation power in the 
observed Universe. It is clear that the probability distribution func- 
tion ( [54] ) for the P m will not reduce to a Dirac delta distribution, 
even though the <x m have been measured perfectly. 

Owing to this fact, there will always remain some uncertainty 
in the power- spectrum estimation, even in the case of perfect mea- 
surements. This residual uncertainty is well known as cosmic vari- 
ance, which is the direct consequence of only observing just one 
specific matter field realization. 

The Gibbs sampling approach, as proposed here, takes this 
cosmic variance into account, by drawing samples from the prob- 
ability distribution P(P m \{Si}), which obey the correct statistical 
properties. 



6.1 Flat versus Jeffrey's prior 

The main characteristics of P(P m \{Si}) can be summarized in terms 
of the mean, mode and variance, which allows us to discuss the 
influence of the actual power law prior choice. The mode of the 
inverse gamma distribution ( [54} is the most frequently used esti- 
mator for the power- spectrum coefficients when using fast Fourier 
transform techniques. The mode P* n is defined as P* m € {P m |VP/ : 
P(Pi\{Si}) < P(P m \{Si})}, which is simply the value of P m which 
maximizes the distribution. For the inverse gamma distribution it is 
given as: 



PI 



(2a + n m ) 



(63) 



Assuming a flat prior a = 0, this immediately returns the frequently 
used and simple power- spectrum estimator: 
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Figure 3. Selection function and two dimensional sky mask. 



90 



!35 



p* = ^ 

n ' 

"/// 



(64) 



(see e.g. |Cui et al.|2008) . 

However, note, that a flat prior is a questionable choice, when 
measuring a variance, which is a scale parameter, as it does not cor- 
respond to maximal ignorance but biases towards large excursions 
from zero ( Wandelt 2004 ). Additionally the flat prior does not per- 
mit to sample every mode in the three dimensional Fourier box with 
finite variance, which can be easily seen by looking at the variance 
of the inverse gamma distribution given as: 



([Pm ~ (P m )f) ■■ 



4((a-2) + nJ2n(a-3) + nJ2) 



(65) 



which is only finite for 2a + n m > 6. 

In a three dimensional cubic Fourier box the minimal number 
n m for a mode is min(n m ) = 6 (except for the zero mode). This 
corresponds to the six mode vectors k k with same vector modulus 
\k k \ along the three axes in Fourier space. Nevertheless, a flat prior 
(a = 0) requires n m > 6 in order to sample the modes with finite 
variance, which cannot be fulfilled for these modes. 

Therefore, we favor the Jeffrey's prior with a = 1, which re- 
quires only n m > 4 to sample each mode, except for the zero mode, 
with finite variance. Jeffrey's prior is also scale invariant, and there- 
fore does not introduce any bias on a log-scale. 



6.2 Informative prior 

The prior discussed in the previous section is a maximal ignorance 
prior in the sense, that every scale has the same probability. This 
prior therefore allows for large excursions around the true value of 
the power- spectrum. This is especially important when sampling 
the largest scales in a galaxy survey, which are poorly constrained 
by measurements. A maximum ignorance prior will therefore re- 
quire to sample a huge space of possible power- spectrum configu- 
rations. 

However, one can argue, that knowledge about the largest 
scales exists, either through theory, or CMB measurements, which 
provide detailed information on the largest scales. 

For this reason, it might be interesting to incorporate this a 
priori knowledge on the power- spectrum into our sampling scheme, 
and therefore allowing for a more efficient strategy to sample the 
space of possible power- spectrum configurations. 

The most simple informative prior can be obtained by limiting 



the range of the Jeffrey's prior, by setting the Jeffrey's prior equal 
to zero for power- spectrum excursion of more than a certain factor: 



PJ for P p ™/T<P m <P p m 
otherwise 



(66) 



where Pf n rwr is our best guess power- spectrum prior, and r is a fac- 
tor, which permits a certain range around the prior power- spectrum. 
The sampling scheme, which arises by implementing this prior, is 
basically the same as described in section |54) with the only excep- 
tion that power specrum coefficients P m , which do not fulfill the 
requirement described in equation (66]), are rejected and have to be 
resampled. This prior would still be a maximum ignorance prior 
over the allowed range. 

Another possible informative prior, which allows to sample 
the entire range, but favoring the region, in which we expect the 
true power- spectrum to exist, can be very easily found by assum- 
ing some a priori knowledge on the <x m . For example, this can be 
achieved by incorporating an independent observation to the sam- 
pling scheme. In this case we can again assume an inverse gamma 
distribution for the power- spectrum prior: 



P(Pm) ■ 



r[(a p 



1)- 



n p n rior 
2 



\ P (a " 

I 1 m 



'72) 



(67) 



where o- p l rwr describes our a priori knowledge on the cr m , a Prwr 
is the spectral index of of our power law prior, which we choose 
a Prwr - iiq^q me Jeffrey's prior, and n^ 1 ^ is the number of mode 
counts for our prior guess. Note, that the combination of a Prwr and 
n p ^ wr defines how sharp this prior would be. As we want our prior 
to contain as little information as possible, we choose n p ^ wr = 5 as 
this is the minimal number of modes, which lead to a finite variance 
with the Jeffrey's prior. 

Introducing an inverse gamma prior will then yield again a 
inverse gamma distribution for the power- spectrum sampling pro- 
cedure: 



P(P m \{Si}) = 



U% ior +crm ^ 



(a Prior -l)+(n Prior +n m )/2 



e 2 p m 



\{a p 



1) - 



) \ p {a Prior +{n\ 
~ I * m 



P { ior +n m )/2) 



.(68) 



By introducing x m = (cr p l nor + <r m )/P m , this can again be rewritten 
as a ^-distribution: 



© 2009 RAS, MNRAS 000,[T]|26] 



Bayesian power-spectrum inference for Large Scale Structure data 1 1 



«/2-l 



P(x m \{ Si }) ■ 



(69) 



where p m = 2(a p ™ r + «™ r + n w )/2 - 1). 

A power- spectrum sample is then obtained in the same fashion 
as described in section [5TT] by 



P m = 



I4I 2 



(70) 



with z m being J3 m element vector, with each element being normally 
distributed with zero mean and unit variance. 

As an example of incorporating theoretical information one 
can generate the cr^ lor by: 

=n p „rp m , (7i) 

which will yield on average the prior power- spectrum. In this fash- 
ion the Gibbs sampler will sample around our prior guess for the 
power- spectrum. 

Note, that this method provides a possible interface for joint 
power- spectrum estimation from a joint CMB and large scale struc- 
ture analysis, where the cr^ wr = cr™ B will be obtained from a 
CMB analysis step. 



6.3 Hidden prior 

In the previous section, we described how to sample each mode 
\k k \ of the Fourier box individually, which yields extremely fine 
frequency resolution in the power- spectrum estimate. As in prac- 
tice such a high frequency resolution is not required, or desired, 
one allows each shell m! to have a finite thickness Ak' m , rather than 
treating it as infinitely thin. 

As the newly designated shells m' have a finite thickness, dif- 
ferent infinitely thin shells m now belong to the same shell m' . 

With the shells having a finite thickness, different close by 
Fourier modes \k k \ fall into the same bin, and therefore an assump- 
tion about the functional shape f m (\k k \), which yields the correct 
weighting, over this shell m' around the central mode |£| m / must be 
assumed: 

Pm ~ A m > f£ for k k e [\k\ m - AkJ2, \k\ m Ak m /2] , (72) 

where A m > is the constant amplitude for the m'th shell. Usually, the 
shape of the power- spectrum f£ is assumed to be constant over the 
shell width Ak' m , which amounts to "binning" the power- spectrum. 
However, in general fjjf could assume any desired functional shape. 

With this assumption, the estimate of the power- spectrum is 
further constrained by the assumed functional shape f£ , and we 
seek to estimate the set of power- spectrum coefficients {P m } from a 
probability distribution given as: 



P({P m }\{si}AC}) 



n{f:'}\{p m }) n{p m })msi}\{p m }, {/r')) 
WiK^i) nisi}) 
p({p m })p({si}\{p m }) 



mfz'}\{si}) 



msi\) 



(73) 



where we assumed conditional independence P({Si}\{P m }, [ f™' }) = 
P({Si}\{P m }) of the functional shape, once all power- spectrum coef- 
ficients are given. 

The usual implicit assumption when introducing this kind of 
power- spectrum binning is: 



nic'Wm}) oc Y\8 D (A' m c f -p m ), 



(74) 



meaning, we claim to know the exact functional shape of the power- 
spectrum within the shells m' . 

This reduces the amount of free parameters to be sampled, 
from the set of N m power- spectrum coefficients P m to the set of N m > 
amplitudes A m > . 

If, for instance, we knew the exact shape of the entire power- 
spectrum, sampling the power- spectrum could be reduced to the 
task of just sampling the overall amplitude. Such an approach is 
persued, when trying to sample the cosmological parameters. 

However, with the above definition, sampling the power- 
spectrum reduces to sample the amplitudes A m /, and we can write: 



P{{A m '}\{Si\AfZ}) 
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(75) 



where a m , = Zmem> o~m/fHt and n m' = Zmem' As this prob- 
ability distribution factorizes in the amplitudes A m > , each of these 
amplitudes can be independently sampled from the inverse gamma 
distribution, with the method described in section |5T| 

A power- spectrum coefficient P m is therefore obtained as: 



P m = 



rr , f m 

u m! Jm 



(76) 



where z m ' is a n m > component vector, with the elements being Gaus- 
sian distributed with zero mean and unit variance. 

Note, since n m > ^ n m the variance for the power- spectrum co- 
efficients P m is reduced. This is the result of reducing the amount of 
free parameters, by introducing "binning", as now each amplitude 
estimate for the A m > is based on more supporting points than the 
individual P m . 

Due to the finite shell width Ak m > neighboring modes are cou- 
pled. This fact could be exploited to circumvent the problem of 
missing mode coupling in the nonlinear regime. For example, if 
the different shells would be logarithmically spaced, it is possible 
to sample the largest scales independently, while towards the non- 
linear regime, the modes get more and more coupled. From a phys- 
ical point of view, the logarithmic spacing would therefore be best 
suited for this problem. On the other hand, introducing "binning" 
to the power- spectrum sampling procedure, makes the method in- 
sensitive to fluctuations an scales smaller than the shell width Ak m > , 
and a Ak m > should therefore be chosen carefully in order not to miss 
features we intend to recover. 

It is also important to note, that the variance in the power- 
spectrum sampling step defines the stepsize for the random walk, 
to sample the joint probability distribution of signal and power- 
spectrum. If the "binning" is unreasonable large, and therefore vari- 
ance is dramatically reduced, it takes much longer to explore the 
joint space of signal amplitudes s t and power- spectrum coefficients 
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Figure 4. Volume rendering of observed galaxy densities in two different projections. 



P m . For this reason, we prefer to sample with rather high spectral 
resolution for the power- spectrum. 



7 NUMERICAL IMPLEMENTATION 

Our numerical implementation of the large scale structure Gibbs 
sampler is called ARES (Algorithm for REconstruction and Sam- 
pling). It can be separated into the described two Gibbs sampling 
steps of estimating a signal sample, which involves solving large 
systems of equations, and sampling the power- spectrum, by draw- 
ing random samples from the inverse gamma distribution. 



7.1 Inversion of matrices 

The signal sampling step is by far the most numerically demand- 
ing step as it requires fast and efficient inversions of large matrices. 
ARES utilizes the fast operator based conjugate gradients inver- 
sion technique as presented in the ARGO code ( [Kitaura & E nfilin 
2008), which has recently been applied to Sloan Digital Sky Data, 
to obtain matter field reconstructions ( Kitaura et al. 2009). Opera- 
tor based inversion techniques have been previously developed for 
CMB data analysis ( |Wandelt|200"4l|Eriksen et al.|2004[| Jewell et al.| 
12004] ). 

Rather than requiring to store the matrices under considera- 
tion explicitely in computer memory, which would be numerically 
prohibitive, this approach only requires to know how these matrices 
would act on a vector, and therefore reduces the required amount 
of computer memory to numerically feasible amounts. Further, it is 
possible to reduce the scaling of the most expensive matrix opera- 
tion to that one of a fast Fourier transform. ARES uses the FFTW3 
library, which therefore reduces the scaling of the most expensive 
operation to 0(N log(N)) ( |Frigo & Johnson|2005j l. 

The FFTW3 library also incorporates the feature of parallel 
Fourier transforms, which allows for straight forward paralleliza- 
tion of our code. 



7.2 Random number generation 

Our random number generation relies on a pseudo random number 
generator as provided by the GNU scientific library (gsl) ( Galassi 
|et al.|2003| ). In particular, we use the Mersenne Twister MT19937, 
with 32-bit word length, as provided by the gsl_rng_mtl9937 rou- 
tine. 

The Mersenne Twister algorithm was designed for Monte 
Carlo simulations, where primarily good quality numbers and 
speed are decisive (Matsumoto & Nishimura 1998). It has been 



proven to have a period of 2 



19937 



1 , where for our application in 



practice we usually require only about 2 35 unique random numbers. 
Also note, that the very high order of dimensional equidistribution 
guarantees negligible serial correlation in the output sequence. The 
Mersenne Twister algorithm passed several tests for statistical ran- 
domness, including the Diehard tests. 



7.3 Parallelization 

Parallelization of the code is a crucial issue, since CPU time is the 
main limiting factor of our method. Even though the conjugate gra- 
dient method allows for very efficient matrix inversions, for a com- 
plete data analysis one has to perform many of these matrix inver- 
sions, and therefore the total prefactor of the algorithm increases. 

In figure [6] we show a typical progression for the wall clock 
times of the map making algorithm during the Gibbs sampling 
chain for a low resolution simulation with N V ox = 64 3 and a setup 
as described in section |8] 

As can be seen, the wall clock times may vary from Gibbs 
iteration to Gibbs iteration. 

This variation is due to complexity of the actual problem in 
the Gibbs iteration, which may require more conjugate gradients 
steps to reach the desired numerical accuracy. 

The average creation time for one Gibbs sample in this low 
resolution simulation is about 20 seconds. Where this test was run 
on a single Desktop CPU. Doubling the resolution will require 
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Figure 5. Test results from low resolution test run. The left panel shows the results of the Burn-in test for eight sequential £J as a function of Fourier modes k. 
The right panel displays results for the correlation coefficients Ci(n) of the correlation length test for all Fourier modes. 



roughly eight times longer to create a sample. Hence, a N V ox = 
128 3 simulation will require roughly 5 minutes and a N V ox = 512 3 
about 8 hours to create a single Gibbs sample. 

This immediately clarifies the need to parallelize the code, as 
usually tenth of thousands of Gibbs samples are required. 

There are in principle two different approaches to parallelize 
the code. 

The most demanding step in the Gibbs sampling chain is 
the map making process. One could therefore parallelize the map 
making algorithm, which in principle requires to parallelize the 
fast Fourier transform. The FFTW3 library provides parallelized 
fast Fourier transform procedures, and implementation of those is 
straight forward ( Frigo & Johnson 2005). However, optimal speed 
up cannot be achieved. The other approach relies on the fact that 
our method is a Monte Carlo process, and each CPU can therefore 
calculate its own Markov chain. In this fashion we gain optimal 
speed up and the possibility to initialize each chain with different 
initial guesses. 

The major difference between these two parallelization ap- 
proaches is, that with the first method one tries to calculate a rather 
long sampling chain, while the latter one produces many shorter 
chains. 

As we will see in the next section, successive Gibbs samples 
are highly correlated in the low signal to noise regime and produc- 
ing a larger number of independent samples requires longer chains. 
This problem, however, can be partially overcome by initializing 
each Markov chain with an independent power- spectrum guess. 



8 TESTING ARES 

In this section, we apply ARES to simulated mock observations, 
where the underlying dark matter signal is perfectly known. With 
these mock observations we will be able to demonstrate that the 
code produces results consistent with the theoretical expectation. In 



addition, we will gain insight in how the code may perform in real- 
world applications, when CPU time is limited. Therefore, we will 
set up Gaussian mock cases, designed to highlight some specific 
feature of the code. 



8.1 Setting up a Gaussian Mock observation 

For the cases studied in this section we set up a set of low reso- 
lution mock observations based on Gaussian random fields, which 
will allow us to calculate a large number of samples in reasonable 
computational times. 

These mock observations are generated on a three dimensional 
cartesian box with N side = 64, corresponding to N vox = 2621 44 
volume elements, and a box length of L = 1500 Mpc, with the 
observer positioned at the center. 

The mock observations are generated according to the pro- 
cedure described in section [43] with the underlying cosmological 
power- spectrum being calculated, with baryonic wiggles, following 
the prescription described in Eisenstein & Hu (1998) and Eisenstein 
|& Hu|p999] >. For these simulations we assumed a standard ACDM 
cosmology with the set of cosmological parameters (Q m = 0.24, 
Q A = 0.76, Q b = 0.04, h = 0.73, cr 8 = 0.74, n s = 1 ). 

To generate the uncorrected noise component, we assume a 
noise structure function = M t F t /A, with A = 8. Ox 10" 3 L 3 /N vox 
in voxel space. Then we draw random Poission samples via the 
procedure described in section [43] 

Note, that this is equivalent to drawing a galaxy distribution 
from the ensemble mean dark matter density, which has been ob- 
tained by averaging over all possible matter field realizations. 

The survey properties are described by the galaxy selection 
function F t and the observation Mask M/. The selection function is 
given by: 



Pi 



bY b/y *_ 



(77) 
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Figure 6. Wall clock times for a single map making step over 10000 Gibbs iterations. The red line denotes the average wall clock time for one matrix inversion 
step. 



where r t is the comoving distance from the observer to the center 
of the ith voxel. For our simulation we chose parameters b = 0.6, 
r = 500 Mpc and y = 2. 

In figure [3] we show the selection function together with the 
sky mask, which defines the observed regions in the sky. The two 
dimensional sky mask is given in sky coordinates of right ascension 
and declination. The projection of this two dimensional mask into 
the three dimensional volume yields the three dimensional mask 
M t . 

Two different projections of this generated mock galaxy sur- 
vey are presented in figure [4]to give a visual impression of the arti- 
ficial low resolution observation. 

8.2 Testing convergence and correlations 

The theory of Gibbs sampling states that the individual Gibbs sam- 
ples converge to being samples from the joint probability distribu- 
tion. However, the theory itself does not provide any criterion to 
detect when the samples start being samples from the joint prob- 
ability distribution. Therefore, this initial burn-in behavior has to 
be tested by experiments. Further we will follow a similar analysis 
as described in Eri ksen et al.| ( |2004] ), to estimate the convergence 
behavior and the correlation length of the Gibbs samples. 

This analysis is important, as it allows us to understand the 
performance of our code in real world applications, which is partic- 
ularly relevant for estimating how many Gibbs samples are required 
for an accurate power- spectrum estimation. 

We study the initial burn-in behavior by setting up a simple ex- 
periment, in which we set the initial guess for the power- spectrum 
to be ten times larger in amplitude than the true underlying power- 
spectrum. Then ARES is applied to the mock observation, as de- 
scribed in the previous section [8?7] to calculate a number of Gibbs 
iterations. The obtained power- spectrum samples P\ of the /th iter- 
ation are then compared to the true power- spectrum via: 

pi ptrue 

$ = ^h> (78) 

where the Pf ue are the true power- spectrum coefficients, with which 
the mock dark matter signal was simulated. 

The results for the £J are shown in figure p\ Here, we observe 



a systematic drift of the successive power-spectra towards the true 
underlying power- spectrum. Further we see that the initial burn-in 
phase, for the kind of setup as demonstrated here, is rather short. 
The algorithm requires about 30 Gibbs iteration after which we can 
assume the samples to be samples from the joint probability distri- 
bution. Also note, that in this test, we do not observe any particular 
hysteresis for the poorly constrained large scale modes, meaning 
they do not remain at their initially set values. This demonstrates 
the ability of the code to handle correctly the mode coupling intro- 
duced by the sky cut. 

However, it is clear that a poor initial guess invalidates a cer- 
tain number of samples, especially at large scales, where the uncer- 
tainty due to the sky mask dominates. For this reason, it is advan- 
tageous to initialize the Gibbs sampling chain with an initial guess, 
which is close to the true power- spectrum, to ensure short burn-in 
times. As can be seen in figure[5] any bad initial guess would reveal 
itself by a systematic drift in the Gibbs chain, and can therefore be 
detected easily. 

Next, we want to analyze the correlation of the individual 
Gibbs samples in the sequence. This is a crucial point, as it per- 
mits us to estimate the number of independent samples, which can 
be obtained from a Gibbs chain of given length. 

The correlation between sequential Gibbs samples can be best 
understood by reviewing the sampling algorithm. 

A Gibbs sample of the joint probability distribution of sig- 
nal and power- spectrum is obtained in two steps. In the first step 
a Wiener reconstruction is performed, based on the assumption of 
a given power- spectrum, and the power lost due to noise filtering, 
masks and selection effects is replaced by a fluctuation term. The 
signal obtained in this first sampling step mimics a full sky noise- 
less signal. It is clear, that the power- spectrum of this signal is de- 
termined by the data in the high signal-to-noise region and by the 
assumed power- spectrum in the low signal-to-noise region. 

In the second step the power- spectrum sample is generated, 
based on this full sky noise-less signal sample, obtained in the 
previous signal sampling step. The obtained power- spectrum then 
works as input power- spectrum to the next Gibbs step, and the iter- 
ation starts again. 

In this fashion the Gibbs sampler performs a random walk in 
the multi-dimensional space of signal map and power- spectrum. 
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Figure 7. The left panel shows the mode statistics for some selected Fourier modes. The green vertical line denotes the true underlying power- spectrum. 
The red line denotes the averaged mean obtained from the Gibbs chain, and the black vertical line denotes the power- spectrum of the specific matter field 
realization. The right panel shows the power- spectrum covariance matrix estimated from the 40,000 Gibbs samples. 



The stepsize of the power- spectrum sampling step is solely deter- 
mined by cosmic variance, and does not take into account the noise 
variance, as described in section |5J] 

However, we want to probe the full probability distribution, 
which includes both noise and cosmic variance. This difference 
does not matter in the high signal to noise regime, since there cos- 
mic variance will dominate the total variance, and for any practi- 
cal purposes all Gibbs samples will be uncorrelated in this regime. 
This, however, is not true in the low signal to noise regime. Since 
the random stepsize between two subsequent samples is determined 
only by the cosmic variance, and not by the much larger noise vari- 
ance, two sequential samples will be strongly correlated. In this 
case a longer Gibbs sequence is required to produce uncorrelated 
samples. 

Reducing the variance by introducing binning to the power- 
spectrum, as described in section [O] will lead to even longer cor- 
relation length in the Gibbs chain. This simply means, the joint 
probability distribution will be sampled with a finer resolution in 
power- spectrum space. 

We study this correlation effect by assuming the power- 
spectrum coefficients P/ of different modes / in the Gibbs chain 
to be independent and estimate their correlation in the chain by cal- 
culating the autocorrelation function: 



Ci{n) = 



iP\-(Pi) P\ +n -(Pi) 
\ ^VarPi ^VarPi 



(79) 



where n is the distance in the chain measured in iterations (for a 
similar discussion in case of the CMB see Eriksen et al. 2004 ). 

We can then define the correlation length of the Gibbs sam- 
pler as the distance in the chain n c beyond which the correlation 
coefficient C/(n) has dropped below 0.1. 

The results for the different modes / are presented in figure [5] 
As one can see, the vast majority of the different Fourier modes 
have a correlation length of about n c ~ 100 Gibbs iterations. 
The rest of the modes show increasingly longer correlation length, 
which increases towards the highest frequencies contained in the 



box. For this reason, the Nyquist frequency has the longest correla- 
tion length. 

Especially for the highest frequencies towards the Nyquist fre- 
quencies, noise domination is only a partial explanation for the 
long correlation length. The far more important fact is, that the 
Nyquist frequency can in general not be resolved properly by the 
data, for example the Nyquist frequency is contained only once in 
the Fourier box. For this reason, the variance increases towards 
the Nyquist frequency. Note, that this effect arises from the tech- 
nical implementation of the fast Fourier transform, which operates 
on a finite grid, and that we are in principle able to account for 
these technical effects of the analysis scheme itself. However, the 
long correlation length for the Nyquist frequencies will in general 
only provide a rare amount of independent estimates at the Nyquist 
frequency and therefore this must be taken into account in further 
analysis. 

It is clear, that this effect shifts to higher frequencies as soon 
as the resolution of our analysis scheme is increased. This can be 
observed in the high resolution analysis discussed in the next sec- 
tion [83] Also note, that this effect can be cured by introducing an 
informative prior which will greately reduce the correlation length 
at these frequencies, as shown later in section [84] 

To study the marginalized posterior P/ distributions in more 
detail we plot their histograms in figure [7] It is worth mentioning 
that none of them is even approximately Gaussian. 

Another crucial point to address is the question how well we 
were able to account for effects of the survey geometry. This in- 
formation is contained in the correlation structure of the estimates. 
Therefore, we can examine this effect by calculating the correlation 
matrix of the P/ estimates: 



Civ = 



Pi -(Pi) Pi>-(Pi>) 
^VarPi ^VarPi 



(80) 



where the ensemble averages are taken over 40, 000 Gibbs samples. 

We present the result in figure[7] It can be clearly seen that this 
correlation matrix has a very well defined diagonal structure, as ex- 
pected from theory. The highest off-diagonal correlations have been 
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Figure 8. 2D marginalized posterior densities. Each plot shows the full joint posterior of the data, integrated over all dimensions except for the two shown. 



measured to be less than 20%, and are found at the highest frequen- 
cies close to the Nyquist frequency. Figure[7]also shows a blue band 
of anti-correlation around the diagonal. This anti-correlation indi- 
cates that the power- spectrum frequency resolution is higher than 
supported by the data. Since the data is fixed, and the mask cou- 
ples neighbored Fourier modes, an increase in the power- spectrum 
amplitude P/ has to be compensated with a decrease in the neigh- 
boring power- spectrum amplitude P/+i to have a good fit to the 
data. It is therefore possible, in a post-processing step, to reduce 
the frequency resolution of the estimated power- spectra until the 
anti-correlation vanishes. This is the idea behind the running me- 
dian estimator, which will be presented in section [9] 

However, since the posterior distributions for the P/ are non- 
Gaussian, the two point correlations do not contain all information. 
For this reason, we also demonstrate the marginalized posterior dis- 
tribution for pairs of P/S in figure [8] where we also show examples 
of maximally correlated and maximally anti-correlated modes. 

Finally, we have plotted the full spectrum computed from our 
40,000 sample run in figure[9] As can be seen, we chose a very high 
frequency resolution to reduce the correlation length of the Gibbs 
sampler in the low signal to noise regime. The estimated ensemble 
mean power- spectrum follows the true underlying power- spectrum, 
in particular the baryonic features. Towards the large scales, the 
uncertainty increases, as expected, since due to survey geometry 
and selection effects these scales are only poorly constrained by 
the data. 



8.3 High resolution Simulation 

In the previous section, we performed a low resolution analysis to 
compute a sufficiently large set of samples to estimate the correla- 
tion behavior of our algorithm. However, such a large amount of 
samples is not necessarily required and computational time may be 
better invested in performing higher resolution analysis. Therefore, 
in this section we describe the results obtained from such a high 
resolution analysis. 

The setup for this test is the same as described in section [ITT] 



with the exception that here we use 25 6 3 voxels on an equidistant 
grid. 

The main limitation for this test is CPU time, as a single sam- 
pling step takes about one hour. For this reason, extremely long 
chains are not feasible. Hence, we run many, rather short, parallel 
Gibbs chains, as described in section [731 m this fashion we obtain 
an optimal speed up for this parallelization scheme. The Gibbs sam- 
pler was run over 24 independently initialized chains and provided 
a total of 4230 samples. 

We present the power- spectra obtained from the multiple- 
chain analysis in figure [9] As can be seen there is overall good 
agreement with the realization specific power- spectrum and the en- 
semble mean estimate found by the Gibbs sampler. We also do not 
observe a detectable bias in any parts of the spectrum. 

Towards the Nyquist frequency the uncertainty increases. This 
is expected, as towards the edges of the box, the number of modes 
decreases, and variance increases. Note, that in this fashion the 
method takes into account the uncertainty introduced by the anal- 
ysis scheme itself, for example the Fourier space discretization of 
the FFT. 



8.4 Testing an informative Prior 

When trying to analyze real galaxy surveys, one is faced with the 
situation that due to sky cuts and selection effects, usually less than 
30% of the volume is available for analysis. This affects mainly 
the estimate of the largest scales in the survey, as they are sparsely 
sampled, and therefore poorly constrained by the data. 

In this situation the Jeffrey's prior, as a maximal ignorance 
prior, allows for large excursions from the expected true underly- 
ing power- spectrum. Since the Jeffrey's prior provides equal prob- 
ability for all scales between zero and plus infinity, it also allows 
for power- spectrum values, which can be excluded on theoretical 
grounds, or by complementary experiments, which are more sensi- 
tive at the largest scales, such as CMB experiments. 

From a Bayesian point of view, one could argue, that in the 
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Figure 9. Power- spectrum estimates obtained from the Gibbs sampling chain for a low resolution (left panel) and a high resolution run (right panel). The 
black lines represent the ensemble mean of the sample set and the light gray and dark gray shaded regions denote the one and two sigma confidence regions 
respectively. Additionally we show the according input power-spectra. The blue line shows the cosmological power- spectrum from which the matter field 
realization was drawn, and the red line is the power- spectrum of this specific matter field realization. 



presence of a priori knowledge, a maximal ignorance prior is not 
the optimal choice. 

Rather than sampling the entire space of possible power- 
spectrum coefficients P m with equal probability, it would be ben- 
eficial to preferably sample the region in which we expect the true 
power- spectrum to exist and allowing for larger excursions with 
smaller probability. 

This would have the effect, that the region of interest would 
be sampled more densely, and therefore allowing for better power- 
spectrum estimates with the same amount of Gibbs samples. Also 
remember, that according to the discussion in section [5^2] the prior 
can be changed for any final post-processing analysis. 

As an informative prior can lead to a more efficient sampling 
strategy in the presence of a priori knowledge, in the following we 
test ARES when employing the inverse gamma prior as described 
in section lo\2l 

We base the inverse gamma prior on a flat power- spectrum 
guess, which was calculated according to Eisenste in & Hu| (l998 ) 
and |Eisenstein & Hu ( 1999 ), without the baryonic wiggles. We ex- 
plicitly do not incorporate the information on the baryonic fea- 
tures, to demonstrate that solely the data drives their estimate. Fur- 
ther, we choose n^ wr = 5, in order to make the prior sufficiently 
broad, while at the same time ensuring that it has finite variance. 

With this prior choice we repeat the standard testing procedure 
as described in section [8Jl 

At first we test the initial burn-in time, by starting with a 
power- spectrum which is a factor 10 higher in amplitude than the 
underlying true power- spectrum. 



The results for the according described in equation ( 78 1, are 
presented in figure [TO] It can be seen that the burn-in time is much 
shorter for the large scale modes, which are poorly constrained by 
the data. Also note, that the overall burn-in times for the power- 
spectra are comparable to those of the maximal ignorance prior 



case (see figure [5}, indicating that these modes are not influenced 
by the informative inverse gamma prior. 

The real advantage of the informative prior becomes obvious 
when analyzing the correlation length for the individual power- 
spectrum coefficients P m in the Gibbs chain. Again we used a low 
resolution 64 3 Voxel simulation in order to estimate the correlation 
coefficients, as given by equation {80} . The results for this test are 
presented in figure [T0| 

In comparison to figure [5]it is clear, that the informative prior 
has a positive influence on the correlation length, which in this test 
are maximally on the order of hundred Gibbs iterations. 

As discussed previously the long correlation length at the 
highest frequencies are mainly of technical nature, as the Nyquist 
frequencies cannot be properly represented in the finite Fourier box 
required for the FFT. This fact introduced artificial variance, which, 
however, our method can take into account. The informative prior 
helps in this situation, as it stabilizes these artificial fluctuations by 
prior information. 

In addition, we observed a much better numerical behavior of 
our method when employing the informative prior, as the code does 
not run as frequently into numerically extreme regimes as with the 
maximum ignorance prior. This leads to a faster convergence of the 
conjugate gradient algorithm towards the desired accuracy. 

Further, we also observe a better correction of survey geom- 
etry effects. The correlation function for the Pi is plotted in figure 
[TT| The maximal correlation between different P/ in this test was 
less than 10%, which is a clear improvement. 

For comparison we also plot the 2D marginalized posterior 
densities, for the maximally correlated and anticorrelated modes in 
the maximum ignorance case. 

Finally, we present the low and high resolution power- spectra 
for the informative prior in figure [12] Note, that our prior did 
not contain any information on the baryonic oscillations. As can 
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Figure 10. Same as figure[5]but for low resolution runs with the inverse gamma prior. 



be clearly seen, the baryonic features have nicely been recovered, 
demonstrating, that our informative prior provided much less infor- 
mation than contained in the data. 



8.5 Testing with galaxy mock catalogs 

In this section, we describe the application of ARES to a mock 
galaxy survey based on the Millennium run ( |De Lucia & Bl aizot 
12007] ). 

The intention of this exercise is two-fold. First we want to test 
ARES in a more realistic setup, where the intrinsic shot noise of the 
galaxy distribution is correlated with the underlying signal, which 
could not be tested with the Gaussian tests before. And second, 
we want to demonstrate that ARES is able to reconstruct the fully 
evolved non-linear matter distribution of the N-body simulation. 

This mock galaxy survey consists of a set of comoving galaxy 
coordinates distributed in a 500 Mpc box. To obtain a realistic sky 
observation from this full sky galaxy sample, we virtually observe 
these galaxies through the sky mask and according to the selec- 
tion function presented in figure [5] The discrete galaxy distribu- 
tion resulting from this mock observation is then sampled to a 128 3 
equidistant grid. 

To reduce gridding artifacts, such as aliasing power, we em- 
ploy a supersampling technique as proposed in Jasche et al. (2009). 
This allows us to accurately treat the mode coupling, and will yield 
a precision estimate of the power- spectra up to the highest frequen- 
cies contained in the box. 

Similarly to the method described in |8.3[ here we will run 
4 independently initialized chains. Further, we employ the maxi- 
mum ignorance Jeffrey's prior. The galaxy distribution of this mock 
galaxy catalog follows the fully evolved non-linear matter distribu- 
tion. Nevertheless, we initialize the Gibbs sampler with the linear 
power- spectrum. Then the initial burn-in period, of about 50 sam- 
ples, is required to reach the non-linear power- spectrum. The sys- 
tematic drift towards the correct power- spectrum is represented in 
figure[l3] This experiment nicely demonstrates that the Gibbs sam- 



pling approach is able to recover the non-linearities of the fully 
evolved matter density field. At this point it is important to note, 
that the Wiener filter is a linear operation on the data, and as such 
leaves the statistics of the data intact. This has been demonstrated 
by |Kitaura et al.|(2009) , where they show, that the statistics of the 
reconstructed density field is consistent with a log-normal distri- 
bution, as expected for a non-linearly evolved matter distribution. 
This discussion clarifies, that the Wiener filter, or the Gibbs sam- 
pling approach as presented in this work, is very well able to cap- 
ture the non-Gaussian characteristics of the density field. 

However, in case one would like to perform a higher resolution 
analysis, it would be advantageous to initialize the Gibbs chain with 
a nonlinear power- spectrum guess, to yield even shorter burn-in 
times. 

The ensemble averaged power- spectrum obtained from this 
run, together with the one and two sigma confidence regions, is 
also presented in figure [13] Here it can clearly be seen, that the 
recovered power- spectrum is consistent with the fully non-linearly 
evolved matter field. Towards the larger scales the uncertainty in- 
crease, which is due to the imposed survey geometry. 

So far, in all test, we have focussed only on the recovery of 
the power- spectrum, and ignored the sample of reconstructed den- 
sity fields. Since the Gibbs sampler also provides samples from the 
matter density field posterior, we are able to calculate any desired 
statistical summary for the matter field reconstructions. The abil- 
ity to provide uncertainty estimates for the recovered density fields 
will in general be valuable for further science based on the matter 
field estimates. For this reason, in figure [T4| we present the esti- 
mated mean and variance maps obtained from the 4000 Gibbs sam- 
ples. As can be seen, the variance map clearly captures the features 
of the survey geometry and selection effects. With the set of Gibbs 
samples being available, all joint uncertainties can easily be prop- 
agated to the finally estimated quantities, such as the gravitational 
potential or large scale cosmic flows, by applying the according op- 
eration to the individual matter field samples. The result of such a 
procedure then again yields a probability distribution in the final 
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quantity, enabling us to provide uncertainty information for these 
quantities. 



9 OPERATIONS ON THE SET OF GIBBS SAMPLES 

In this section, we present an example application operating on the 
set of Gibbs samples. 

The outcome of our Bayesian method is not a single estimate 
but a Gibbs sample representation of the full posterior probability 
distribution for the power- spectrum coefficients. We are therefore 
able to propagate all uncertainties to any final result, simply by 
applying a post processing step to all Gibbs samples. The result of 
such an operation would again yield a probability distribution of 
the estimated final quantity. 

As a simple demonstration, we apply a running median filter to 
the set of power- spectra, which will reduce the spectral resolution. 

It is known, that the median is a better estimator of the typical 
value of a sample than the mean when there are large extraneous 
outliers in the sample (Stuart & Ord 1994). For this reason, we 
choose the median to estimate the mode power in a given frequency 
bin Ak m . Such a bin can be chosen to be large enough to smooth 
out any fluctuation below a certain scale. In our specific case we 
vary the bin width Ak m with frequency, to allow for a logarithmic 
binning. 

The median P v m of a set of power- spectrum amplitudes {P m } 
contained within the frequency bin of width Ak m around the mode 
k m then satisfies the inequalities 

P(Pm<P V m )> \ (81) 

and 

<P{P m >Pl)> 1 -. (82) 

The running median is then evaluated for every frequency in the 
power- spectrum sample. 

We apply the running median to the set of power- spectrum 



samples obtained from the two Gaussian mock cases with the Jef- 
frey's and the inverse gamma prior. In doing so, we are able to cal- 
culate the mean and according uncertainty regions for the running 
median estimates. This effect has already been discussed in section 
|6.3| Since the reduction of frequency resolution also decreases the 
amount of free parameters, the total variance decreases as well. 

The results of this experiment are demonstrated in figure [15] 
As one can easily see, the running median estimates are much 
smoother than the according Gibbs estimates. Also the reduction 
of frequency resolution by the running median estimator yields 
smaller confidence regions. 

Finally we are interested in the recovery of the baryonic fea- 
tures in the power- spectrum. We therefore employ the common pro- 
cedure of dividing the measured power- spectrum by one without 
baryonic wiggles p™ wlggles \ We then obtain the wiggle function as: 

P 

j -wiggle _ m /gg\ 

•* m pnowiggles ' ^ ' 

* m 

Calculating the wiggle function for all Gibbs samples and apply- 
ing the running median estimator to the set of wiggle functions 
will yield the distribution of wiggle functions. We then estimate 
the mean and the according one and two sigma confidence regions. 
The result of this calculation is presented in figure [15] for the two 
Gaussian test cases. As expected, the variance towards the largest 
scales increases. Nevertheless, figure [T5] clearly demonstrates that 
the baryonic features have been recovered precisely by the Gibbs 
sampling approach. 

This example nicely demonstrates that the uncertainty estima- 
tion can easily be transported to any final quantity estimated from 
the set of Gibbs samples. 



10 CONCLUSION 

In this work, we presented ARES, a new Bayesian computer al- 
gorithm, designed to extract the full information on the two point 
statistics from any given probe of the three dimensional large scale 
structure. 
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Figure 12. Same as figure ?? but for the inverse gamma informative prior. 



The scientific aim of this algorithm is to provide an estimate 
of the power- spectrum posterior P({P(ki)}\{di}), conditional on a 
data set, while accounting and correcting for all possible sources 
of uncertainties, such as survey geometry, selection effects and bi- 
ases. This is achieved by exploring the power- spectrum posterior 
P ({P(ki)}\{di}) via a Gibbs sampling approach. 

While direct sampling from the power- spectrum posterior is 
not possible, it is possible to draw samples from the full joint pos- 
terior distribution P({P(ki)}, {^}|{^}) of the power- spectrum coef- 
ficients P(ki) and the three dimensional matter density contrast am- 
plitudes Si conditional on a given set of data points {<//}. 

The entire Gibbs sampling algorithm therefore consists of two 
basic sampling steps, in which first a full three dimensional Wiener 
reconstruction algorithm is applied to the data and then a power- 
spectrum is drawn from the inverse gamma distribution. In this 
fashion we obtain a set of power- spectrum and signal amplitude 
samples, which provide a full representation of the full joint pos- 
terior distribution P({P(ki)}, {^}|{^}). The scientific output of this 
Bayesian method therefore is not a single estimate but a complete 
probability distribution, enabling us to calculate any desired statis- 
tical summary such as the mean, mode or variance. 

We also demonstrated, that given a set of Gibbs samples, it 
is possible to provide an analytic approximation to the power- 
spectrum posterior P({P(ki)}\{di}). This Blackwell-Rao estimator 
has an analytically appealing form enabling us to calculate any de- 
sired moment of the probability distribution in a simple analytic 
way. 

In addition, since the full joint probability distribution is avail- 
able, it is easy to carefully propagate all uncertainties through to the 
result of further post-processing analysis steps, such as parameter 
estimation. 

In this work, we focused on thoroughly testing the perfor- 
mance and behavior of our method by applying it to simulated data 
with controlled properties. These tests were designed to highlight 
the problematic of survey geometry and selection effects, for the 



two cases of Gaussian random fields and a mock galaxy catalog 
based on the Millennium run. 

One of the main goals of these tests was to build up intuition 
on the phenomenological behavior of the Gibbs sampling algo- 
rithm, estimating particularly issues, such as the correlation length 
of the Gibbs chain, burn-in and convergence times. The result of 
these tests is of special relevance, as it shows how long the Gibbs 
sampling chain has to run in order to produce a sufficient amount 
of independent samples. 

Through these experiments we found that the longest corre- 
lation lengths are dominated by the poorly constrained Nyquist 
modes of the box, which can be easily alleviated by imposing some 
prior knowledge on these modes. In doing so we found that the 
maximal correlation length for the Gibbs chain was on the order of 
hundred Gibbs samples. Thus, creating a large number of indepen- 
dent samples in a full scale data analysis is numerically very well 
feasible. 

However, the most important result of these tests is, that our 
method is able to correct for artificial mode coupling due to the sur- 
vey geometry and selection effects. This was tested by examining 
the correlation structure of the Gibbs samples, which showed that 
the maximal residual correlation can be reduced to less than 10%, 
demonstrating that this method correctly accounts for geometry ef- 
fects. 

The application of ARES to a galaxy mock catalog, based on 
the Millennium run, demonstrated the ability of our method to cap- 
ture the characteristics of the fully nonlinearly evolved matter field. 
This is owed to the fact, that the Wiener filter is a linear operation 
on the data, and as such does not destroy the intrinsic statistical 
characteristics of the data set. 

Nevertheless, a full real data analysis of existing redshift sur- 
veys requires the treatment of additional systematic effects such 
as scale or luminosity dependent biases or redshift space distortion 
corrections, which we defer to future works. However, the Bayesian 
framework, as presented here, can take all these effects naturally 
into account and treats them statistically fully consistent. Beside 
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Figure 13. The left panel shows successive power- spectrum samples during the burn-in period together with the initial guess (black line). The right panel 
shows the estimated power- spectrum of the Gibbs sampling analysis. The blue line denotes the initial power- spectrum guess, the black curve is the ensemble 
mean power- spectrum and the light gray and dark gray shaded regions represent the one sigma two sigma confidence regions respectively. 
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Figure 15. Running median estimates of the power-spectra (upper panels) and the according wiggle functions (lower panels) for the set of Gibbs samples with 
the Jeffrey's prior (left panels), and the inverse gamma prior (right panels). The black lines represent the ensemble mean of the sample set and the light gray 
and dark gray shaded regions denote the one and two sigma confidence regions respectively. Additionally we show the according input power-spectra. The 
blue line shows the cosmological power- spectrum from which the matter field realization was drawn, and the red line is the power- spectrum of this specific 
matter field realization. 



the possibility to include various kinds of uncertainties, the Gibbs 
sampling approach also allows for a natural joint analysis of differ- 
ent data sets, taking into account the systematics of each individual 
data set. 



In summary, we showed that ARES is a highly flexible and 
adaptive machinery for large scale structure analysis, which is able 
to account for a large variety of systematic effects and uncertainties. 
For this reason, ARES has the potential to contribute to the era of 
precision cosmology. 
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APPENDIX A: DISCRETE FOURIER 
TRANSFORMATION 

we define the numerical Fourier transformation as: 

N-l 
7=0 

and the inverse Fourier transform 

N-l ¥ 
xj = cY J y k J" ik -t l =C J] W 



2njk- 



(Al) 



(A2) 



/c=0 



k=-a-\) 



The normalization coefficients C and C are chosen such that they 
fulfill the requirement: 



CC = 



N ' 



(A3) 



APPENDIX B: CHANGE TO FFT REPRESENTATION 

However, for computational convenience, we would rather like to 
consider the set of coefficients § ku which are the matrix coefficients 
of the Fourier transformed signal covariance matrix S . The signal 
covariance matrix coefficients are related to the Fourier matrix 

coefficients Ski via the discrete Fourier transform defined in Ap- 
pendix [a] 



Su 



N-l N-l 

k=0 1=0 
N-l N-l 



(Bl) 



where N is the number of Volume cells which we consider in 
our calculation, and C is the normalization of the discrete Fourier 
Transform. The matrix elements of the Jacobi matrix for this coor- 
dinate transformation can then be written as: 



SF(ij)(mn) — 



dSi 



- C 2 II 



k=0 1=0 



dS„ 



k=0 1=0 
q2 e 2n^-(im-jn) 

C 2 A(ij) (ran) , 



2n^(ik-jl)cK cK 
°km°ln 



(B2) 
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where we defined the transformation matrix A, 
e 2n^-{im-jn) _ norm of the Jacobi determinant can then be cal- 
culated in matrix notation as: 

|det(J)| = |det(c 2 A)| 

= |det(c 2 I N 4 A)| 

= |det (C 2 I N4 )| |det (A)| 

= C 2iv4 |det(A)| , (B3) 

where we made use of the fact, that the Jacobi matrix is a quadratic 
N 2 x N 2 matrix, which is true for usual Fast Fourier Transform 
implementations. Since A is a unitary matrix: 



(A(ij)(mn)) 



A (mn)(ij) 



= e -2*^(mi-nj) 
~ A {ij)(mn)' 



(B4) 



the norm of the determinant det (A) is unity. Therefore, the norm of 
the Jacobi determinant is just a constant: 



|detCJ)| = C : 



-2iV 4 



(B5) 



With this definition we can perform the change of coordinates 
and express the probability distribution given in equation (??) in 
terms of the set of the matrix coefficients §ki- 

n{hi)\{si}) = c 2N4 n{Sij}\{Si}) = c 2N4 ^^p({ Si }\{§ki})Am 

where V{{S k {\) is the signal covariance matrix prior in Fourier 
space, and P({si}\{§ a}) is given as: 



P({si}\{S k i}) = <P({si)\S) 
1 



Vdet(2^-S) 
1 

Vdet(2;rs/ 
1 

Vdet(27rS)~ 
1 

Vdet(2;rs/ 



-IXiZjSiStfhj 



• 2 y.^-iy.^- 1 e^^r^-^Lls 



lI.iI.jStC ZSZf^e' 



_C1 yN-l yN-l y. s . e 2n ^ ik §, } V . v . 



C±_ yN-l yN-l ? * a ; * 
C2 ^k=0 ^Z=0 S k^kl s l 



with Si being the signal coefficients of the discrete Fourier trans- 
formed signal. To determine the determinant det (2ttS) in Fourier 
space, we rewrite equation \B2\ as a matrix product: 



k=0 1=0 
N-l N-l 



J^J^CAiJuCAl 



(B8) 



where we defined the unitary matrix A ik = e 2mk n . The determi- 
nant can then be evaluated in matrix notation as: 

det(2^S) = det(2CTC 2 A§A -1 j 

= det|c 2 I N2 A2^§A- 1 j 

= det (C 2 I N 2 ) det (A) det (inh j det (A" 1 ) 

L^ttS). (B9) 



= C lN det 



APPENDIX C: WIENER VARIANCE 

In section |4~3] we described, that a signal sample can be obtained by 
adding a fluctuation term y t to the Wiener mean reconstruction m,-. 
According to equations (36) and ( ftp) we can rewrite the fluctuation 
term as: 

y, = sj -J^DJ^J^^jN^d! 

j m l 

= s *~y D.; y Y k ^ I Z K,k + < 

j m l V k J 
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From this we immediately see that the mean of yf. 



(C2) 



vanishes by construction, as (s* k ) = (ej) = 0. Note, that the signal 
and noise covariance are given by: 



{s*s*) = S i j 
and 



(C3) 



(C4) 



Also note that , by construction, there is no correlation between the 
two random variates s* and e* as they have been created by two 
independent random processes. Therefore, we yield: 

<*?<> = 0. (C5) 

Then we can calculate the variance as: 

<y& 9 ) = (e d u E s * z - E E K ^' e ' E D ~i- E 5 " < - E E ) 

* 7 L £ ml J r L s t u J' 

= E D W E <*i + E K mj NjK, r N; u \e; e :) 

jr L £s m/?M 

= E E S ^ + E K^K^N, 

jr L £s m/?M 

= E^ir 1 Z s ^ + Z*^<*<^ 

7> L m/f 

= J^DjD-J S-j} +2 J K mJ N£K l 

jr L ml 

= Z D « lz £ D * 



Therefore, the additive fluctuation term provides the correct vari- 
ance. 



(C6) 
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